Abstract

We propose a general form of community detecting functions for finding
the communities or the optimal partition of a random network, and
examine the concentration and stability of the function values using the
bounded difference martingale method. We derive LDP inequalities for
both the general case and several specific community detecting functions:
modularity, graph bipartitioning and q-Potts community structure. We
also discuss the concentration and stability of community detecting
functions on different types of random networks: the sparse and non-sparse
networks and some examples such as ER and CL networks.

1 Introduction 


One of the main problems in network study is finding the communities or the 
optimal partition for a given network. A standard approach is to design a 
community detecting function of network partitions which achieves its extremum 
when the partition is optimal, or conversely the optimality of the partition is 
defined by a reasonably designed community detecting function. 



There are several important applications of this approach. One is to find the 
community structures of social networks. A well known community detecting 
function for this application is the modularity [7], proposed by M.E.J. Newman.
Another application is circuit partitioning in the design of a computer system
[8], in which circuits must be partitioned into groups so that the number of
signals crossing the partition boundaries is minimized. The community
detecting function used in this application is a variation of the function used in the
graph bipartitioning problem [5]. In this paper, we propose a general form of
community detecting functions for which the functions mentioned above and 
many others are considered as specific cases. 



Previous studies show that optimizing the community detecting function is 
in general an NP-complete problem [9]. Algorithms such as simulated annealing
have been developed and tested on specific networks [5]. However, in theoretical studies,
it is nearly impossible to obtain an exact solution for a single network. What 
we usually have is the average value of the community detecting function over 
a random network ensemble [5J |B]. This gap between the computational and 
theoretical points of view gives rise to the need to study the concentration and
stability of the community detecting functions. 

Roughly speaking, the stability of the community detecting function means 
that the change of the community detecting function is small when the corre- 
sponding network structure undergoes a small perturbation; and the concen- 
tration means that the theoretically predicted average value of the community 
detecting function becomes more precise when the system size grows larger. In 
this paper, we derive LDP type inequalities to illustrate both aspects: the
fluctuation of the community detecting function in a fixed network ensemble
and its asymptotic behavior when the system size goes to infinity.

The concentration and stability of a community detecting function are
important from the following points of view:

Firstly, it is sometimes helpful to consider the given network as a sample
randomly picked from a designed network ensemble. The ensemble should be
easy to analyze and should capture some important features of the given network
such as the average degree or the degree distribution, but neglect the detailed
structural information by randomization. In this scenario, the concentration and stability
estimate the departure of the function value of the given network from the 
ensemble average. 

Secondly, for real-world networks such as social networks and the Internet,
we have problems like lack of information, uncertainty of the environment, and
changes of the network over time. The optimization problem on this kind of
network is by nature posed on a network ensemble. Since this ensemble is
usually hard to analyze, simulation results for specific networks are in turn taken to
estimate the ensemble average. In this scenario, the concentration and stability
come from the problem itself. For example, in circuit partitioning, if we have
already found an optimal circuit partition under a given signal flow
configuration, the concentration and stability imply that this partition is still "good",
even if it is not optimal, under similar configurations.

Finally, the concentration and stability give a measure of how a specific
network deviates from an ensemble. For example, suppose we calculate the modularities
for a given network G and the ER network ensemble. These two values by
themselves do not tell us whether the communities are well defined in G. However,
the LDP inequality from the concentration gives a bound on the probability
that a network randomly picked from the ER network ensemble has smaller
modularity than G by chance. If the probability is small enough, it is
statistically significant that G is not picked from the ER ensemble as far as the
community structure is concerned.

Although not essential to the analysis in this paper, the general form of the 
community detecting function can be physically interpreted as a Hamiltonian 
of a spin system. We study the concentration and stability through a classical
approach introduced by J. Spencer et al. [14]: consider an exploring process (edge
by edge or node by node) of the network, construct a Doob martingale and take
advantage of martingale inequalities.

We apply our results to several special community detecting functions:
modularity, graph bipartitioning and q-Potts community structure. We also discuss
the problems on different classes of networks such as Erdős-Rényi (ER) and
Chung-Lu (CL) networks. Considering the asymptotic behavior of the number
of edges when the network size grows, we also classify the network models into
sparse and non-sparse networks. We only study the "sparse" networks
defined by a constant upper bound on the degree, under which we prove a very
general concentration result for all these problems. For the non-sparse networks
we derive our results only for specific cases.



2 Background 

The underlying martingale inequality (Azuma's inequality) can be traced back
to the Chebyshev inequality. The Bernstein inequality [10][11], named after
S.N. Bernstein, is considered a modification of the Chebyshev inequality and gives an
exponentially decreasing probability upper bound. Hoeffding (1963) [12] then introduced and
proved the first version of the so-called "Hoeffding-Chernoff bound", which gives a
very general probability upper bound for sums of i.i.d. random variables. In the same paper,
Hoeffding also proposed a slightly different version for the case where the random
variables are not necessarily identically distributed but uniformly bounded; this
version is usually called "Hoeffding's inequality". Azuma [1] extended the
independent variables in Hoeffding's inequality to martingale differences and obtained
"Azuma's inequality". This is a great improvement for us, because
independence between the presence of edges is usually unavailable in network
studies.

W.T. Rhee and M. Talagrand (1987) [13] apply Azuma's inequality to
NP-hard optimization problems. They give stochastic models for the Bin Packing
problem and the Traveling Salesman problem as two examples, and assume a
sequence of optimal solutions of a growing system to construct the martingale.

Shamir and Spencer (1987) [14] use Azuma's inequality on the problem of the
chromatic number of random graphs. C. McDiarmid's study (1989) [15] summarizes
this technique on random graphs as the method of bounded differences and
discusses some extensions such as the isoperimetric inequalities. Then several
statistics of random graphs have been studied, such as the average distance [3],
the connected component size [4] and the number of triangles [16]. More general
martingale inequalities [11] [4] and models of random graphs [3] [4] are considered
in these studies.

Our study combines the above two types of applications of Azuma's
inequality, considering both the optimization problem and the random graph factors.
Compared with W.T. Rhee and M. Talagrand's work [13], the stochastic model
is replaced by a random network, and instead of an optimal solution sequence of
increasing system sizes, we consider an arbitrary solution on a fixed but
gradually uncovered random graph and finally try to find the optimal ones. Shamir and
Spencer's [14] and C. McDiarmid's [15] works are the most relevant to ours.
We use almost the same technique in our derivation. The other works mentioned above
all consider some direct statistics of random networks without an optimization
procedure, and so are conceptually simpler than ours.



3 General Model 

The random network model is represented by an ensemble $\Omega(N) = \{\Gamma(N), \mathcal{P}\}$,
where $\Gamma(N)$ is the collection of all connected graphs with $N$ nodes and $\mathcal{P}$ is a
probability measure on $\Gamma(N)$. $G \in \Omega(N)$ is considered as a random network
taking values in $\Gamma(N)$ with probability $\mathcal{P}(G)$. Its adjacency matrix is given
by $A = [A_{ij}]_{N \times N}$ and the degree of each node is denoted by $d_i$ ($i = 1, \ldots, N$).
Additionally we require that the probabilities for any two nodes to be linked are
independent, i.e. $\{A_{ij} \mid i < j\}$ are $N(N-1)/2$ independent random variables.
A spin vector $s = (s_1, \ldots, s_N)$ ($s_i \in \{-1, 1\}$) assigns a spin to each node and
indicates a partition which puts the nodes with the same spin in one group.
The community detecting function is given by:

$$h_G(s) = -\frac{1}{N}\sum_{i,j} F_{ij}(G)\, s_i s_j \tag{1}$$

where $\{F_{ij}\}$ are functions of the random network $G$ and hence random
variables. Because the community detecting function has the form of
the Hamiltonian of a spin system, we also call $h_G(s)$ the Hamiltonian in what
follows. Let $S$ be the spin configuration space including all possible spin states
satisfying given constraints. We take $s_0 \in S$ as the optimal configuration of $h_G$,
and

$$H(G) = \min_{s \in S} h_G(s) = h_G(s_0).$$

Instead of $h_G(s)$, we focus on $H(G)$ because this value depends only on $G$
and is therefore a property of the network itself rather than of some specific
configuration. Furthermore, the statistics of $H(G)$ provide information about the network
ensemble. In this section, we give large deviation results for the distribution
of $H(G)$.
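As a concrete illustration of these definitions, the following Python sketch evaluates $h_G(s)$ and finds $H(G)$ by exhaustive search on a very small graph. The function names and the `constraint` argument are hypothetical, and the $1/N$ normalization is an assumption taken from the $c/N$ bound used later in this section. Exhaustive search is only feasible for small $N$.

```python
from itertools import product


def hamiltonian(F, s):
    """h_G(s) = -(1/N) * sum_{i,j} F[i][j] * s[i] * s[j] (assumed normalization)."""
    n = len(s)
    total = sum(F[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -total / n


def optimal_value(F, constraint=lambda s: True):
    """H(G): minimum of h_G over all spin vectors in {-1, +1}^N allowed by `constraint`."""
    n = len(F)
    return min(
        hamiltonian(F, s)
        for s in product((-1, 1), repeat=n)
        if constraint(s)
    )


# Path graph 0-1-2-3 with F_ij = A_ij (the graph bipartitioning choice).
PATH = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print("unconstrained H(G):", optimal_value(PATH))
    print("zero magnetization:", optimal_value(PATH, lambda s: sum(s) == 0))
```

The `constraint` argument plays the role of the configuration space $S$: changing it changes the set over which the minimum is taken, but not the Hamiltonian itself.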

Theorem 1 If $\{F_{ij}\}$ ($i,j = 1, \ldots, N$) satisfy that for any two networks
$G, G' \in \Omega$ which differ by only one edge, $|F_{ij}(G) - F_{ij}(G')| \le c$ where $c$ is
a constant independent of $i,j$, then for every real number $t$,

$$\mathcal{P}\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp\left(-\frac{t^2}{c^2}\right) \tag{2}$$

where $\langle \cdot \rangle_\Omega$ is the ensemble average and $c^2$ is a constant independent of $t$
and $\Omega(N)$.



To prove Theorem 1, we need Azuma's inequality and the following lemma:

Theorem 2 (Azuma) [1] Suppose $Y_k = \sum_{i=1}^{k} X_i$ is a martingale. Given
boundedness of each increment, $|X_i| \le b_i$, the following inequality holds for any
real number $t$:

$$P\left(|Y_k - E[Y_k]| > t\right) \le 2\exp\left(-\frac{t^2}{2\sum_i b_i^2}\right).$$

Lemma 1 If the conditions in Theorem 1 hold, then $|H(G) - H(G')| \le \frac{c}{N}$.

Proof of Lemma 1: Suppose $G$ and $G'$ differ only at the edge
between the nodes $i_0$ and $j_0$. Then for every $s$,

$$|h_G(s) - h_{G'}(s)| \le \frac{1}{N}\,|F_{i_0 j_0}(G) - F_{i_0 j_0}(G')|\,|s_{i_0} s_{j_0}| \le \frac{c}{N}.$$

Without loss of generality, we assume $H(G) \ge H(G')$ and that $H(G')$ is achieved
at a specific configuration $s'_0$, that is, $h_G(s'_0) \ge H(G) \ge H(G') = h_{G'}(s'_0)$. So
$|H(G) - H(G')| \le |h_G(s'_0) - h_{G'}(s'_0)| \le \frac{c}{N}$.



Proof of Theorem 1 Suppose $G \in \Omega(N)$ is a random graph whose node
pairs are labeled from 1 to $N(N-1)/2$. Each node pair corresponds to one random
variable $A_k = \mathbf{1}_{\{\text{node pair } k \text{ is linked}\}}$ ($k = 1, \ldots, N(N-1)/2$). Therefore the
random graph $G$ can be considered as a random process $\{A_1, \ldots, A_{N(N-1)/2}\}$,
and its filtration is $\mathcal{F}_k$ ($k = 0, \ldots, N(N-1)/2$). Define $H_k = E[H(G) \mid \mathcal{F}_k]$
($k = 1, \ldots, N(N-1)/2$), and $H_0 = E[H(G) \mid \mathcal{F}_0] = \langle H \rangle_\Omega$. $H_k$ ($k = 0, \ldots, N(N-1)/2$)
is a martingale by construction. We construct an auxiliary process
$G' = \{A'_1, \ldots, A'_{N(N-1)/2}\}$ for which $A'_j = A_j$ when $j \ne k+1$ and $A'_j = 1 - A_j$
when $j = k+1$. $G'$ shares the same filtration with $G$ and represents the graph
that differs from $G$ only by the link between the node pair $k+1$. The increment
of $H_k$ satisfies

$$\begin{aligned}
|H_{k+1} - H_k| &= |H_{k+1} - E[H_{k+1} \mid \mathcal{F}_k]| \\
&\le |E[H(G) \mid \mathcal{F}_{k+1}] - E[H(G') \mid \mathcal{F}_{k+1}]| \\
&\le E\left[\,|H(G) - H(G')| \mid \mathcal{F}_{k+1}\right] \\
&\le \frac{c}{N}.
\end{aligned}$$

In the last inequality, $|H(G) - H(G')|$, as a random variable depending only on
the realization of $G$, is bounded by $\frac{c}{N}$ according to Lemma 1, so its conditional
expectation is also controlled by this bound. We apply Azuma's inequality to
$H_k$, and obtain:

$$\mathcal{P}\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp\left(-\frac{t^2}{2\left(\frac{c}{N}\right)^2 \frac{N(N-1)}{2}}\right) \le 2\exp\left(-\frac{t^2}{c^2}\right) \qquad \square$$
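The last step of this proof can be checked numerically. The sketch below, with hypothetical helper names, evaluates the Azuma bound for $N(N-1)/2$ increments of size $c/N$ and compares it with the simplified bound $2\exp(-t^2/c^2)$; it assumes the increment bound $c/N$ from Lemma 1.

```python
import math


def azuma_bound(t, increments):
    """Two-sided Azuma bound 2*exp(-t^2 / (2 * sum b_i^2)), capped at 1."""
    denom = 2.0 * sum(b * b for b in increments)
    return min(1.0, 2.0 * math.exp(-t * t / denom))


def theorem1_bounds(t, c, n):
    """Return (Azuma bound for edge exposure, simplified bound 2*exp(-t^2/c^2))."""
    steps = n * (n - 1) // 2           # one martingale step per node pair
    exact = azuma_bound(t, [c / n] * steps)
    simplified = min(1.0, 2.0 * math.exp(-t * t / (c * c)))
    return exact, simplified


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, theorem1_bounds(t=2.0, c=1.0, n=n))
```

For fixed $t$ and $c$ both values stay of the same order as $N$ grows, which is why the inequality alone expresses stability rather than concentration.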

The inequality by itself only indicates some level of stability of $H(G)$ over
the ensemble $\Omega$. To interpret this inequality as a concentration result, instead
of a single network ensemble, we need a random network model consisting of a
sequence of network ensembles with different system sizes $N$.

The above general model does not specify the way to generate the ensemble
sequence when $N \to \infty$, especially the growth rate of the number of edges.
To keep the networks connected, the number of edges must grow at least linearly
with respect to the number of nodes. We call the network models whose numbers
of edges grow linearly and super-linearly sparse networks and non-sparse
networks respectively. There is no general concentration result for the non-sparse
networks, so we leave the discussion of this case to the later sections on specific
community detecting functions. As for the sparse networks, there are various
ways to define them as long as their numbers of edges grow linearly, and we
consider only one definition, K-bound networks.



4 K-bound networks 



A given network is called K-bound if the constant $K$ is an upper bound of the
network degrees. A network ensemble or a random network model is called
K-bound if $K$ is a uniform upper bound of the degrees over the network ensemble
or the ensemble sequence.

A subtle point here is that the K-bound constraint may violate the
independence of the links. But since in a random network with independent links the
probability distribution of the degree decays very fast (the probability for the
degree to exceed $K$ is of order $O(e^{-K})$), when $K$ increases, the difference
between the probability measures of the network ensembles with and without the
K-bound constraint goes to zero very quickly. For K-bound networks, we have
the concentration result:

Theorem 3 If $\Omega$ is a K-bound ensemble and $\{F_{ij}\}$ ($i,j = 1, \ldots, N$) satisfy
that for any two networks $G, G' \in \Omega$ which differ by only one edge,
$|F_{ij}(G) - F_{ij}(G')| \le c$ where $c$ is a constant independent of $i,j$, then for every
real number $t$,

$$\mathcal{P}\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp\left(-\frac{N t^2}{8K^2c^2}\right) \tag{3}$$

Proof of Theorem 3 Consider a random graph $G \in \Omega(N)$ whose nodes are
labeled $1, 2, \ldots, N$. $G_k$ is the subgraph containing nodes 1 to $k$ and is
also considered as a filtration. Define $H_k = E[H(G) \mid G_k]$ ($k = 1, \ldots, N$), and
$H_0 = E[H(G) \mid G_0] = \langle H \rangle_\Omega$. $H_k$ ($k = 0, \ldots, N$) is a martingale by construction.
Consider a network $G''$ that differs from $G$ only at the node $k+1$, i.e. all the differing
elements in the adjacency matrix are for the edges linked to node $k+1$. $G''_k$ is
the corresponding filtration for $G''$. Note that the degree is at most $K$, therefore
$G$ and $G''$ can differ by at most $2K$ edges.

$$\begin{aligned}
|H_{k+1} - H_k| &= |H_{k+1} - E[H_{k+1} \mid G_k]| \\
&\le \max_{G''} |E[H(G) \mid G_{k+1}] - E[H(G'') \mid G_{k+1}]| \\
&\le 2K\,|E[H(G) \mid G_{k+1}] - E[H(G') \mid G_{k+1}]| \\
&\le \frac{2Kc}{N}.
\end{aligned}$$

Here $G'$ is the same as defined in the proof of Theorem 1. The difference
$H(G) - H(G'')$ can be decomposed into at most $2K$ terms which all have the
same bound as $H(G) - H(G')$. Applying Azuma's inequality, we have:

$$\mathcal{P}\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp\left(-\frac{t^2}{2\left(\frac{2Kc}{N}\right)^2 N}\right) \le 2\exp\left(-\frac{N t^2}{8K^2c^2}\right) \qquad \square$$

Remark: According to this inequality, the fluctuation of $H(G)$ about its
mean is of order $O(N^{-1/2})$.

When we are given a network $G_0 = \{A_{0ij}\}$, one way to generate a network
ensemble is to let the given network undergo a certain perturbation.
Suppose the perturbation is denoted by $\delta G(p_0, p_1)$, where $p_0, p_1$ are two
probabilities. The network ensemble $G = \{A_{ij}\}$ generated by $G_0$ and $\delta G(p_0, p_1)$
satisfies:
(a) if $A_{0ij} = 1$, then $A_{ij} = 0$ with probability $p_0$ and $A_{ij} = 1$ with probability
$1 - p_0$;
(b) if $A_{0ij} = 0$, then $A_{ij} = 1$ with probability $p_1$ and $A_{ij} = 0$ with probability
$1 - p_1$.



To conserve the average degree of the original network $G_0$, we additionally
require $p_0 m = p_1\left(\frac{N(N-1)}{2} - m\right)$, where $m$ is the total number of edges in $G_0$.
Since $m \le \frac{NK}{2}$, the requirement implies:

$$p_1 \le \frac{K}{N-1-K}\, p_0$$
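The perturbation rule can be sketched in a few lines of Python. The function names are hypothetical; `degree_preserving_p1` solves the average-degree condition $p_0 m = p_1(N(N-1)/2 - m)$ for $p_1$, and `perturb` applies rules (a) and (b) independently to each node pair. The sketch does not enforce the K-bound on the perturbed graph, which would need an extra rejection step.

```python
import random


def degree_preserving_p1(n, m, p0):
    """Solve p0 * m = p1 * (n(n-1)/2 - m) for p1."""
    return p0 * m / (n * (n - 1) / 2 - m)


def perturb(adj, p0, p1, rng):
    """Remove each existing edge with probability p0, add each missing edge with probability p1."""
    n = len(adj)
    new = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] == 1:
                value = 0 if rng.random() < p0 else 1
            else:
                value = 1 if rng.random() < p1 else 0
            new[i][j] = new[j][i] = value
    return new


def ring(n):
    """Ring graph on n nodes: every node has degree 2 (so K = 2)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        adj[i][j] = adj[j][i] = 1
    return adj


if __name__ == "__main__":
    g0 = ring(10)
    p1 = degree_preserving_p1(n=10, m=10, p0=0.1)
    g = perturb(g0, 0.1, p1, random.Random(0))
    print("p1 =", p1, "edges after perturbation:", sum(map(sum, g)) // 2)
```

For the ring, $m = NK/2$ exactly, so $p_1$ attains the upper bound $K p_0/(N-1-K)$.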

If $G_0$ satisfies the K-bounded degree condition, we have:

Theorem 4 If $\Omega$ is a K-bound ensemble generated by a given K-bound
network $G_0$ and a small perturbation $\delta G$, and $\{F_{ij}\}$ ($i,j = 1, \ldots, N$) satisfy that for any
two networks $G, G' \in \Omega$ which differ by only one edge, $|F_{ij}(G) - F_{ij}(G')| \le c$
where $c$ is a constant independent of $i,j$, then for every real number $t$,

$$\mathcal{P}\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp\left(-\frac{N t^2}{4\left[c^2(2K^2p_0^2 + Kp_0) + Kct/3\right]}\right) \tag{4}$$

To prove Theorem 4, we need a variance form of Azuma's inequality as
below:

Theorem 2' [2] Suppose $Y_k = \sum_{i=1}^{k} X_i$ is a martingale. Given boundedness
of each increment, $|X_i| \le M$, and $\mathrm{Var}(X_i) = \sigma_i^2$, the following inequality holds
for any real number $t$:

$$P\left(|Y_k - E[Y_k]| > t\right) \le 2\exp\left(-\frac{t^2}{2\left(\sum_i \sigma_i^2 + Mt/3\right)}\right)$$



Proof of Theorem 4 The proof is almost the same as that of Theorem 3.
We only need to replace Theorem 2 in that proof by Theorem 2' and find an
upper bound for $\mathrm{Var}(X_i)$.

Let the random variable $L_i$ be the number of changes between $G_0 = \{A_{0ij}\}$ and
$G = \{A_{ij}\}$ ($i < j$) related to the node $i$. Since $L_i$ can be considered as a sum
of 0-1 random variables:

$$\begin{aligned}
\mathrm{Var}(L_i) &= Kp_0(1-p_0) + (N-1-K)\,p_1(1-p_1) \\
&\le Kp_0 + (N-1-K)\,p_1 \\
&\le 2Kp_0
\end{aligned}$$

and

$$E[L_i] = Kp_0 + (N-1-K)\,p_1 \le 2Kp_0.$$

Since $|X_i| \le \frac{cL_i}{N}$, we have:

$$\mathrm{Var}(X_i) \le \frac{c^2}{N^2}E[L_i^2] = \frac{c^2}{N^2}\left(E^2[L_i] + \mathrm{Var}(L_i)\right) \le \frac{c^2}{N^2}\left(4K^2p_0^2 + 2Kp_0\right),$$

and Theorem 2' becomes the conclusion of Theorem 4. Compared with Theorem
3, Theorem 4 gives a much sharper concentration around the average value
whenever $p_0$ is small. $\square$
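To see how much the variance form helps, the two bounds can be compared numerically. The sketch below uses hypothetical function names and the expressions obtained from the proofs above: increments of size $2Kc/N$ over $N$ steps for Theorem 3, and the variance bound $(c^2/N^2)(4K^2p_0^2 + 2Kp_0)$ with $M = 2Kc/N$ for Theorem 4. The parameter values are arbitrary.

```python
import math


def theorem3_bound(t, n, k, c):
    """2*exp(-N t^2 / (8 K^2 c^2)), capped at 1."""
    return min(1.0, 2.0 * math.exp(-n * t * t / (8.0 * k * k * c * c)))


def theorem4_bound(t, n, k, c, p0):
    """2*exp(-N t^2 / (4 [c^2 (2 K^2 p0^2 + K p0) + K c t / 3])), capped at 1."""
    denom = 4.0 * (c * c * (2.0 * k * k * p0 * p0 + k * p0) + k * c * t / 3.0)
    return min(1.0, 2.0 * math.exp(-n * t * t / denom))


if __name__ == "__main__":
    n, k, c, t = 1000, 5, 1.0, 0.5
    print("Theorem 3:", theorem3_bound(t, n, k, c))
    for p0 in (0.5, 0.1, 0.01):
        print("Theorem 4, p0 =", p0, ":", theorem4_bound(t, n, k, c, p0))
```

The gain comes from the denominator: for small $p_0$ and small $t$ it is much smaller than $8K^2c^2$.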

In the following sections, we prove the concentration for several different
community detecting functions on both the non-sparse and sparse subclasses of
Erdős-Rényi (ER) and Chung-Lu (CL) random networks. For all the non-sparse
cases, we can use Theorem 1 together with a "divide and conquer" technique
described in the next section to obtain concentration. However, when the
community detecting function has additional properties like $\langle H(G) \rangle_\Omega = O(N)$,
as in the graph bipartitioning case, we can scale $t$ in Theorem 1 by $N$ to obtain
a better concentration inequality without using the "divide and conquer"
technique. For the K-bounded degree subclasses of ER and CL networks, we
use Theorem 3 or Theorem 4 along with technical estimates specific to each case.



5 Modularity 

Modularity is one of several effective criteria for the detection of community
structures in random networks [7]. In this section, the above theorems are used
to obtain LDP results on the modularity functional over the non-sparse
Erdős-Rényi subclass ER[N, p] and, where there is a uniform bound on the degrees of
G, over the sparse ER[Np, K] subclass (and the Chung-Lu CL[N, w, K] subclass,
whose properties are given in the next section). The non-sparse cases here
and in the next section are treated by the "divide and conquer" technique
because, without prior knowledge of the properties of their respective ensemble
averages $\langle H(G) \rangle_\Omega$, we have no recourse to the shorter method of rescaling $t$
in Theorem 1 by $N$.

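In the modularity problem, $F_{ij}(G) = \frac{N}{4m}\left(A_{ij} - \frac{d_i d_j}{2m}\right)$, therefore the
Hamiltonian (1) is

$$h_G(s) = -\frac{1}{4m}\sum_{i,j}\left(A_{ij} - \frac{d_i d_j}{2m}\right) s_i s_j \tag{5}$$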

Here $m$ is the total number of edges in the network, $d_i$ is the degree of node $i$,
and $A_{ij}$ is the adjacency matrix. The Hamiltonian is more complicated than
the previous two cases because $F_{ij}(G)$ depends not only on the local information
$A_{ij}$ but also on global information such as $d_i$ and $m$. Suppose $G'$ has one
more edge $(i_0 j_0)$ than $G$. To apply Theorems 1 and 3, we need to estimate



\Fij(G) - Fij{G')\ as follows, 



— \F t3 {G)-F t3 {G') 



E 



(A 1 A \ ( d>id ' j didj ) 

{A " All) ( 2(m+l) 2m > 



SiSj 



- m \ Aio,Jo ™i ,Jo\ 



A'„ 



^-^ d'^ - didj 
4^ 2(m+l) 



+ <£)(=^- 1)1^)1 



xiT-^)D* d - 



2(m+ 1) 



'#.; 



V + r 



, ^ djdj 



,J 2(m + l) 



(6) 



Next, we estimate the 4 terms in equation (6). The first term \Ai Qt 



A'i • | = 1. The second term: 



ju 



E 



d^d'- — djdj 



2(m+l) 



1 



< 



2(to+1) 

1 



2(m+l) 

< 2 



E «. " *o)dj + E (4 - d Jo) d i + K - d io)(d' jo - d jo ) 

(2m + 2m + f) 



For the third term: 



( 1 


V2(m+1) 
1 


2m(m + f ) 
f 


2m(m + 1) 
2 



Vij-i)E^. 



(2m) 






For the last term, since di,dj < to, 



TO + 1 



A' 



didj 



< ( ) 

- v TO + r 

< max{ 



13 2(to + 1) 

2 

1 



2(to+ 1) 
m 2 1 



2(to + l) 2 ' to + 1 



} 



< 



With the above inequalities, we conclude 

, ,s, 11-W HAT 



(7) 



where $m^*$ is the minimum value of $m$ required by connectivity. This completes
the proof of the technical estimates needed in the application of Theorems 1
and 3 below.
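The effect of a single added edge on the modularity Hamiltonian can also be inspected numerically. The Python sketch below uses hypothetical helper names and assumes the standard two-group normalization $h_G(s) = -\frac{1}{4m}\sum_{i,j}(A_{ij} - d_i d_j/2m)\,s_i s_j$; it is an illustration of the quantity being bounded, not a check of the constants in the estimate.

```python
def adjacency(n, edges):
    """Symmetric 0-1 adjacency matrix from an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def modularity_hamiltonian(adj, s):
    """-(1/4m) * sum_{i,j} (A_ij - d_i d_j / 2m) s_i s_j (assumed normalization)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    m = sum(deg) / 2
    total = sum(
        (adj[i][j] - deg[i] * deg[j] / (2 * m)) * s[i] * s[j]
        for i in range(n)
        for j in range(n)
    )
    return -total / (4 * m)


# Two triangles joined by the single edge (2, 3).
BARBELL_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
SPLIT = (1, 1, 1, -1, -1, -1)

if __name__ == "__main__":
    g = adjacency(6, BARBELL_EDGES)
    g_plus = adjacency(6, BARBELL_EDGES + [(0, 5)])   # G' has one more edge
    h, h_plus = modularity_hamiltonian(g, SPLIT), modularity_hamiltonian(g_plus, SPLIT)
    print("h_G(s) =", h, " h_G'(s) =", h_plus, " |difference| =", abs(h - h_plus))
```

Because the degrees and $m$ enter every term, adding one edge changes all $F_{ij}$ slightly, which is why the estimate needs more than the single-entry argument used for Lemma 1.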

For general ER networks without additional non-sparseness or bounded degree
properties, and also for sparse ER networks without assumptions on, for
instance, the scale of the ensemble average $\langle H(G) \rangle$, Theorem 1 gives:



\H(G)- <H> Q |>i)<2exp(- 



t 2 



<r 2 (^) 2 



<2exp( -) 



where $\sigma = \frac{5}{4}$, which shows that the direct application of Theorem 1 alone does
not allow us to obtain concentration.

However, for sparse ER networks with bounded degree, using Theorem 3, we
prove:

Theorem 5 In an ER[$Np = \lambda$] or CL[N, w] network ensemble $\Omega$ with a uniform
upper bound $K$ on the degree, the optimal modularity $H(G)$ satisfies the
concentration inequality:



&>{\H{G)- <H> n \>t)< 2exp(- 



Nt 2 



25/ N \ 2 if2 
2 \m* > 



) < 2exp(- 



Nt 2 , 

25 7^2- 
2 xv 



(8) 



In some sense, networks with a degree upper bound are an extreme case
of sparse networks. Next, we consider the other extreme case, the non-sparse
networks, using the following "divide and conquer" method. The network is
non-sparse if:

(a) $\langle m \rangle_{\Omega(N)} = pN^{\alpha}/2$ ($p$, $\alpha$ are two constant parameters);
(b) $P\left(m - \langle m \rangle_{\Omega(N)} < -Nt\right) < \exp\left(-\frac{t^2}{\lambda}\right)$,

where $m$ is the total number of edges and $p$ is a constant independent of $N$. The
meaning of property (b) is shown in the following lemma:

Lemma 2 If in a network all the edges are independent, there exists a
constant $\lambda$ independent of $N$ such that the number of edges $m$ satisfies

$$P\left(m - \langle m \rangle_{\Omega(N)} < -Nt\right) < \exp\left(-\frac{t^2}{\lambda}\right)$$

Proof of Lemma 2: $m = \sum_{i<j} \mathbf{1}_{\{A_{ij}=1\}}$, where $\mathbf{1}_{\{A_{ij}=1\}} \in [0, 1]$ are
independent random variables. According to Hoeffding's inequality [12],

^ r m-<m> . . 2t 2 N 2 (N -l) 2 , 

Substituting t by 2t/(N - 1), we get 

2t 2 N 2 t 2 

0>{m- <m>< -Nt) < exp(- jv ) < exp(-^) 

where X = \. 
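The lemma can be illustrated with the standard one-sided Hoeffding inequality for a sum of $n$ independent variables in $[0, 1]$, $P(S - E[S] \le -s) \le \exp(-2s^2/n)$. The sketch below applies it with $n = N(N-1)/2$ node pairs and $s = Nt$; the function names are hypothetical, and the constant obtained this way need not coincide with the $\lambda$ used in the text.

```python
import math


def hoeffding_lower_tail(n_terms, s):
    """Standard Hoeffding bound exp(-2 s^2 / n) for n independent [0, 1] variables."""
    return math.exp(-2.0 * s * s / n_terms)


def edge_count_tail(n_nodes, t):
    """Bound on P(m - <m> <= -N t) when all N(N-1)/2 edge indicators are independent."""
    n_pairs = n_nodes * (n_nodes - 1) // 2
    return hoeffding_lower_tail(n_pairs, n_nodes * t)


if __name__ == "__main__":
    for n_nodes in (10, 100, 1000):
        # The exponent equals -4 N t^2 / (N - 1), so the bound is uniform in N.
        print(n_nodes, edge_count_tail(n_nodes, t=1.0), math.exp(-4.0))
```

The exponent simplifies to $-4Nt^2/(N-1)$, which is bounded above by $-4t^2$ for every $N \ge 2$, so a bound of the form $\exp(-t^2/\lambda)$ holds with a constant that does not depend on $N$.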

An example of a non-sparse network is the ER network ER[N, p] with constant
probability $p$ for any two nodes to be linked. Replacing $t$ in (b) by $\mu N^{\alpha-1}/2$,
where $\mu$ is independent of $N$, we get

$$P\left(m < (p-\mu)N^{\alpha}/2\right) \le P\left(m < \langle m \rangle_{\Omega(N)} - \mu N^{\alpha}/2\right) \le \exp\left(-\mu^2 N^{2(\alpha-1)}\right) \tag{9}$$

Finally, we split $P(|H(G) - \langle H \rangle_\Omega| > t)$ into two cases according to $m$,
and get the concentration result:

P(|if(G)- < H > n | > i) 
= P(m < (p - ^)N a /2)P(\H{G)- < if > n | > i|ro < (p - m)^"/2) 
+ P(m > (p - v)N a /2)P(\H(G)- < if > n | > t|m > (p - m)^"/2I10) 

< expC-^iV 2 ^- 1 )) • 2exp(-t 2 ) + 1 -2cxp(-- 1 -) 

< 2exp{-fi 2 N 2 ( a -^ -t 2 ) + 2exp{-t 2 N 2 (p-n) 2 ) (11) 

When $1 < \alpha \le 2$, the inequality shows concentration. For the ER[N, p] network,
$\alpha = 2$, and we have the following theorem.

Theorem 6 In a non-sparse ER[N, p] network ensemble $\Omega$, the optimal
modularity $H(G)$ satisfies the concentration inequality:

$$P\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp\left(-\mu^2 N^2 - t^2\right) + 2\exp\left(-t^2 N^2 (p-\mu)^2\right) \tag{12}$$






6 Concentration of Modularity on Chung-Lu networks $CL_\infty[N, w]$, $CL[N, w, K]$

Scale-free or power-law random graphs, including the Barabási-Albert (BA),
Molloy-Reed (MR) and Chung-Lu (CL) models, are frequently encountered in
the study of random networks that arise in social and ecological problems. For
brevity, we use the Chung-Lu model, which is based on connected random graphs
CL(N, w) with a fixed expected degree sequence $w$ [25]. The Chung-Lu model is
easier to work with than the Molloy-Reed [22], Newman-Strogatz-Watts [21]
and Barabási-Albert [20] formulations because it specifies more information at
the level of each node $j$ in the graph $G(N)$. It is based on working with subsets
CL[N, w] of the scale-free random graphs specified by a fixed (deterministic)
expected degree sequence of weights

$$w = (w_1, \ldots, w_N), \qquad w_j = E[\deg(j)].$$

The average degree in CL[N, w] is given by

$$d(w) = \frac{1}{N}\sum_{i=1}^{N} w_i.$$

In the growth process of CL random networks (cf. BA [20]), a new node $v_i$
is added at time $i \le N$ and $m'$ random and independent edges are then added
between this node $v_i$ and those already present. Thus, for node $i$ added, the
probability of adding a link to node $j$ is $w_j/\sum_i w_i$. The second-order degree,
or average number in the second generation of nodes (where the random graph
connectedness problem is viewed as a two-stage branching process), is given by

$$\tilde{d}(w) = \frac{\sum_{i=1}^{N} w_i^2}{\sum_{i=1}^{N} w_i},$$

whence it is easily shown via an application of Cauchy-Schwarz that

$$\tilde{d}(w) - d(w) \ge 0.$$

A key property of CL[N, w] for the proof of concentration of modularity
below is that the edges between arbitrary pairs of nodes $i$ and $j$ are
independent random variables $A_{ij}$ (where $A(G)$ is the adjacency matrix of
the random graph $G$), and the probability of $A_{ij} = 1$ is given by [3] [4]

$$P\{A_{ij}(G(N)) = 1\} = \frac{w_i w_j}{\sum_{k=1}^{N} w_k}.$$
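A minimal Python sketch of sampling from this edge rule is given below. The function names are hypothetical, self-loops are excluded, and capping the probability at 1 is an added assumption for weight sequences where $w_i w_j$ can exceed $\sum_k w_k$. The sketch does not enforce connectedness or a degree bound.

```python
import random


def chung_lu_probabilities(w):
    """Edge probabilities p_ij = w_i * w_j / sum_k w_k for i != j (capped at 1)."""
    total = sum(w)
    n = len(w)
    return [
        [min(1.0, w[i] * w[j] / total) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def sample_chung_lu(w, rng):
    """Draw one graph with independent edges from the probabilities above."""
    p = chung_lu_probabilities(w)
    n = len(w)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj


if __name__ == "__main__":
    weights = [4, 3, 3, 2, 2, 2, 1, 1, 1, 1]
    graph = sample_chung_lu(weights, random.Random(0))
    print("degrees:", [sum(row) for row in graph])
```

Because self-loops are dropped here, the expected degree of node $i$ is $w_i(1 - w_i/\sum_k w_k)$, slightly below $w_i$.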

In each subclass $\Omega$ = CL[N, w], the average number of edges $m$ is
/N(N-l) \ 1 A A WiW-j



2N 2 



1) v-^ dim) ,. T „. 



JV(JV + 1) ^ d{w) 



where the average degree $d$ may depend on $N$ through the weights $w_j$.

There are no known concentration results on the whole class CL[N, w]; this
class includes sparse random networks that are not uniformly bounded in degree
of nodes. Using the "divide and conquer" technique and Theorem 1 twice, we
prove LDP results for non-sparse Chung-Lu networks $CL_\infty[N, w, \beta]$ for which,
in addition to the above properties, the average degree grows with $N$, that is,
for $\beta > 0$ and $B > 0$, both independent of $N$,

$$d(w) \ge BN^{\beta}.$$

Concentration of modularity in the subclass of bounded degree Chung-Lu
networks CL[N, w, K] is proved in the previous section.

Next, by Lemma 2, this subclass $\Omega(N) = CL_\infty[N, \beta]$ satisfies the non-sparse
property (b) in:

(a) $\langle m \rangle_{\Omega(N)} \ge MN^{\alpha}$ ($M > 0$, $\alpha > 1$ are two constant parameters),
(b) there exists a constant $\lambda$ independent of $N$ such that
$P\left(m - \langle m \rangle_{\Omega(N)} < -Nt\right) < \exp\left(-\frac{t^2}{\lambda}\right)$ for any real $t > 0$.

Property (a) holds by choosing $M = B/2$ and $\alpha = 1 + \beta$. Replacing $t$ in (b)
by $\mu N^{\beta}$ gives

$$P\left(m < (M - \mu)N^{\beta+1}\right) \le P\left(m < \langle m \rangle_\Omega - \mu N^{\beta+1}\right).$$

By conditioning $P(|H(G) - \langle H \rangle_\Omega| > t)$ according to $m$, we derive the result:
for constants $\lambda$, $\beta > 0$ and $\mu < M = B/2$ that do not depend on $N$,

P(\H(G)- < H > CLoo | > t) 
= P(m < (M - v)NP +1 )P {\H{G) - <H>c Loo \>t\m<(M-n)NP +1 )} 
+ P(m >{M- ii)NP +1 )P { \H(G)- < H > CLoo \>t\m>(M - ^N^ 1 )} 



fi 



N 2 \ t 2 ( t 2 (M-n) 2 N 2 P\ 



< exp(-^^).2ex P (- — ) + 1.2ex P ^ J 

< 2cx P (-^-— ) + 2ex P - M 






We used the following applications of theorem 1 : for any real t > 0, 

t 2 
P{\H(G)~ < H> C L 00 \>t\m<(M-n)NP +1 }<2exp{--^) 

2a z 

P(\H(G)- < H> CLoo |>£| m> (Af-^)A^ +1 }<2cxp > ±1 



with respectively, 

2 N(N + l) 
° 2 


25N(N+1) 

32m 2 
25, 3 2 




"" 32 V N-l (N-l) 2 

25 2 
" 16=*' 



and when m 2 > (M - ^) 2 N 2 ^ +1 \ 



.2 



N(N + 1) 25N(N+1) 



< 



< 



32m 2 
25 N(N + 1) 
32 (M - y.fN 2 ^ 2 

1 
(M - y^W 3 ' 



where the constant $c > 0$ comes from the technical estimate (6) that is valid for
any pair of random networks $G$, $G'$ differing by exactly one edge, in any random
ensemble $\Omega$. This completes the proof of

Theorem 7 In the non-sparse $CL_\infty[N, B, \beta]$ network ensemble $\Omega$, the
optimal modularity $H(G)$ satisfies the concentration inequality: there exists $\mu > 0$
independent of $N$, such that for any real $t > 0$,

W,- <*>„!><)< 2e>p(-<^ - il, +2 „ P (_ «■(*/» -^ 

(13) 

where $\lambda = 1/2$ and $\sigma = 5/4$.



7 Graph Bipartitioning on ER and CL Networks 

The so-called graph bipartitioning problem [5] is the simplest example, where
$F_{ij}(G) = A_{ij}$ in (1):

$$h_G(s) = -\frac{1}{N}\sum_{i,j} A_{ij}\, s_i s_j \tag{14}$$
In this problem, each spin state corresponds to a two-group partition of the
given graph, and the optimum gives the partition with the fewest intergroup
links. Fu and Anderson [T5] have shown the equivalence between this problem for
the ER[N, p] network (any two nodes are linked by an edge with probability $p > 0$,
where $p$ is independent of $N$) and the infinite-range SK model, and derived the
average solution in the thermodynamic limit. Banavar et al. [IS] investigate
another case, the ER[$Np = \lambda$] network, and give an empirical average solution.
However, without a concentration result, these solutions are only heuristic. Even
if we accept the solution in the thermodynamic limit, the errors of the solutions
applied to a finite system cannot be estimated. Our research complements their
results.

In their studies, the constraint of zero magnetization $M = \sum_i s_i = 0$ is
required, which forces an equal-size partition. With a non-zero fixed magnetization
constraint $\sum_i s_i = c \ne 0$, we get the optimal partition with two given group sizes.
Without any constraint on $M$, we get the overall optimal partition considering
all possible group sizes. Whichever constraint we use, it only changes the
spin configuration space $S$ and hence the definition of $H(G) = \min_{s \in S} h_G(s)$, so
we have exactly the same concentration results and proofs in all of these cases.

For the most general random graph ensemble, we apply Theorem 1 as follows.
Since

$$|F_{ij}(G) - F_{ij}(G')| = |A_{ij}(G) - A_{ij}(G')| \le 1 \tag{15}$$

we have

$$\mathcal{P}\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp(-t^2)$$

which shows that concentration is not obtained by this approach. To compare
with the papers mentioned above, we consider the two types of ER network
separately. For the first type, ER[N, p], we assume in addition that $\langle H \rangle_\Omega$ is of
order $O(N)$, which is consistent with the results from the replica method
[T5]. This allows us to scale $t$ with the same order and obtain:

Theorem 8 In a non-sparse ER[N, p] network ensemble $\Omega$, the optimal
graph partitioning $H(G)$ satisfies the concentration inequality:

$$P\left(|H(G) - \langle H \rangle_\Omega| > Nt\right) \le 2\exp(-N^2 t^2)$$



For the second type, ER[$Np = \lambda$], for which $Np$ is fixed as $N \to \infty$, we
prove a theorem under the additional assumption of bounded degree. Consider
the ER[$Np = \lambda$] network ensemble generated with parameters $N$, $p$, excluding
all samples whose maximum degree exceeds the bound $K > Np$, where $K$ is
independent of $N$. So when $N \to \infty$, the degree distribution of the network
tends to a Poisson distribution with expected degree $Np$ but with a cutoff at
$K$. If $K$ is large enough, the network ensemble generated like this is almost the
second type of ER. For this case, we use Theorem 3 to prove:
Theorem 9 In an ER[$Np = \lambda$] ensemble $\Omega$ with a uniform upper bound $K$ on
the degree, the optimal graph partitioning $H(G)$ satisfies the concentration
inequality:

$$\mathcal{P}\left(|H(G) - \langle H \rangle_\Omega| > t\right) \le 2\exp\left(-\frac{N t^2}{8K^2}\right)$$

For comparison with Theorems 8 and 9, the simulated concentration for both
cases of the ER model is shown in the following figures. For the constant $p$ case, the
fluctuation of $H(G)/N$ is roughly of order $O(N^{-1})$, while for the constant
$Np$ case, the fluctuation of $H(G)$ is roughly of order $O(N^{-1/2})$.



As an application of the above theorems to a real-world problem, we consider 
the circuit partition optimization problem proposed by S. Kirkpatrick et al. [5]. 
The objective function they proposed is equivalent to: 

$$h_G(s) = -\frac{1}{N}\sum_{i,j}\left(\frac{A_{ij}}{2} - \lambda\right) s_i s_j$$

So $F_{ij} = A_{ij} - \lambda$, where $A_{ij}$ is the number of signals passing 
between circuits i and j, and λ is a balancing coefficient in the optimization 
[8]. In their study, $A_{ij}$ is no longer a 0-1 random variable, but as long 
as the $A_{ij}$'s are uniformly bounded, which is quite reasonable in their 
problem, we can always normalize $A_{ij}$ by the uniform upper bound, and all 
the theorems and proofs in this section still hold. The λ introduced in this 
problem represents a penalty on partitions with unequal group sizes, and is a 
special case of that in the modularity function. 
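
To make the circuit-partition objective concrete, the sketch below evaluates $-\frac{1}{N}\sum (A_{ij}/2 - \lambda) s_i s_j$ for spins $s_i = \pm 1$ after dividing the signal counts by their largest entry, as the normalization remark above suggests. Restricting the sum to ordered pairs i ≠ j is an assumption (the diagonal only adds a constant), and the name `circuit_objective` is hypothetical.

```python
def circuit_objective(signals, spins, lam):
    """Evaluate -(1/N) * sum_{i != j} (A_ij / 2 - lam) * s_i * s_j.

    signals: square matrix of signal counts A_ij (list of lists).
    spins:   list of +1 / -1 group labels.
    lam:     balancing coefficient lambda.
    The counts are divided by their largest entry (assumed normalization).
    """
    n = len(spins)
    bound = max(max(row) for row in signals)
    scale = bound if bound > 0 else 1.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # assumption: diagonal terms are left out
                total += (signals[i][j] / scale / 2.0 - lam) * spins[i] * spins[j]
    return -total / n


two_circuits = [[0, 2], [2, 0]]
print(circuit_objective(two_circuits, [1, 1], 0.0))   # same group
print(circuit_objective(two_circuits, [1, -1], 0.0))  # split groups
```

With λ = 0 the objective is lowest when connected circuits share a group; raising λ penalizes putting everything in one group.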



8 Concentration for Q-Potts Community Structures on ER and CL networks 



This section is devoted to concentration results for the family of objective 
functionals derived from q-Potts models, which were introduced as a viable 
alternative to the Modularity community detection algorithm: 

$$h_G(s) = -\frac{J}{4m}\sum_{i \neq j}^{N} A_{ij}\,\delta(s_i, s_j) + \frac{\gamma}{2m} f(n_s) \qquad (16)$$



where the spins $s_i \in \{0, 1, \ldots, q-1\}$, $A_{ij} \in \{0, 1\}$ is the 
adjacency matrix of G, $\delta(s_i, s_j) = 1$ only if $s_i = s_j$ and is zero 
otherwise, $f(n_s)$ is a function of the occupation numbers 
$n_s = (n_0, n_1, \ldots, n_{q-1})$, G(N, m) is a random graph with N nodes and 
m links, J is a ferromagnetic interaction energy, and γ is a parameter that 
determines the antiferromagnetic activity of the Hamiltonian. 

Figure 1: Each data point is obtained by optimization using simulated annealing 
on 100 realizations of the ER network. The first figure is for ER with constant 
p = 0.05, and the second figure is for ER with constant pN = 5. 

The particular example of these Hamiltonians studied by Reichardt [24] is 



$$h_G(s) = -\frac{J}{4m}\sum_{i \neq j}^{N} A_{ij}\,\delta(s_i, s_j) + \frac{\gamma}{2m}\sum_{s}\frac{n_s(n_s - 1)}{2}$$



where for $\gamma > \gamma^*$, the optimum of $h_G$ favors community structures 
that reflect the network topology of G. The threshold $\gamma^*$ is fixed by 
requiring $h_G(\text{homogeneous}) > h_G(\text{diverse})$, which for two 
communities $c(n_1, m_1)$ and $c(n_2, m_2)$ can be rewritten 

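$$\begin{aligned}
h_G(\text{homogeneous}) &= -\frac{J}{4m}(m_1 + m_2 + m_{12}) + \frac{\gamma^*}{4m} N(N-1) \\
&= -\frac{J}{4m}(m_1 + m_2 + m_{12}) + \frac{\gamma^*}{4m}\left[n_1(n_1-1) + n_2(n_2-1) + 2 n_1 n_2\right] \\
&> -\frac{J}{4m}(m_1 + m_2) + \frac{\gamma^*}{4m}\left[n_1(n_1-1) + n_2(n_2-1)\right] \\
&= h_G(\text{diverse})
\end{aligned}$$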

which is in turn equivalent to the normalized value of the outlink density or 
inter-community links density, 

$$\frac{J m_{12}}{2 n_1 n_2} < \gamma^*,$$

since $m = m_1 + m_2 + m_{12}$ and $N = n_1 + n_2$. 
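
The threshold algebra can be checked numerically. The sketch below codes the two sides of the comparison exactly as written in the derivation above (with the 1/(4m) prefactors) and confirms that the homogeneous state has the higher value precisely when γ exceeds $J m_{12} / (2 n_1 n_2)$. The function names and the sample community sizes are illustrative assumptions.

```python
def h_homogeneous(J, gamma, m1, m2, m12, n1, n2):
    """All nodes in one group: every link is internal, N = n1 + n2."""
    m = m1 + m2 + m12
    n_total = n1 + n2
    return -J / (4 * m) * (m1 + m2 + m12) + gamma / (4 * m) * n_total * (n_total - 1)


def h_diverse(J, gamma, m1, m2, m12, n1, n2):
    """Two groups: only the m1 + m2 intra-community links contribute."""
    m = m1 + m2 + m12
    return -J / (4 * m) * (m1 + m2) + gamma / (4 * m) * (n1 * (n1 - 1) + n2 * (n2 - 1))


def outlink_threshold(J, m12, n1, n2):
    """Normalized inter-community link density J * m12 / (2 * n1 * n2)."""
    return J * m12 / (2 * n1 * n2)


# Hypothetical example: communities of 6 and 5 nodes joined by 3 links.
args = dict(m1=10, m2=8, m12=3, n1=6, n2=5)
threshold = outlink_threshold(1.0, args["m12"], args["n1"], args["n2"])
for gamma in (0.04, 0.06):
    split_favored = h_homogeneous(1.0, gamma, **args) > h_diverse(1.0, gamma, **args)
    print(gamma, gamma > threshold, split_favored)
```

The two boolean columns agree for each γ, as the cancellation of the common terms in the derivation predicts.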

The proofs of LDP over sparse Erdős-Rényi random graphs ER(N,p) and 
scale-free Chung-Lu graphs $CL_\infty(N,w)$ with expected degree sequence w 
will be given for arbitrary f in (16), since they do not depend on the form of 
the function f. To begin the proof for ER(N,p), based on the Azuma-Hoeffding 
inequalities, we define, as before, $H(G) = \min_s h_G(s) = h_G(s_0)$ and label 
the optimum $s_0$. We need a technical estimate required in the application of 
Theorem 1. 

Lemma 3: There is a constant c > 0 independent of N such that for any 
two graphs G(N,m'), G'(N,m = m' + 1) in the random ensemble $\Omega(N)$ which 
differ in exactly one edge, 

$$|H(G) - H(G')| \le c$$
Proof: For a fixed state $s = (s_1, \ldots, s_N)$, since the second term f 
cancels, 

$$\frac{4m}{J}\,|h_G(s) - h_{G'}(s)| = \Big|\sum_{i \neq j} (A_{ij} - A'_{ij})\,\delta(s_i, s_j)\Big| = 2\,\delta(s_{i_0}, s_{j_0}) \le 2,$$

where A' is the adjacency matrix of G' and $(i_0, j_0)$ is the edge in which G 
and G' differ, implying that $|h_G(s) - h_{G'}(s)| \le \frac{J}{2m} \le \frac{J}{2}$. 
A standard argument, assuming without loss of generality that 
$H(G) \le H(G')$, gives the result: with $h_G(s_0) = \min h_G = H(G)$ and 
$h_{G'}(s'_0) = \min h_{G'} = H(G')$, we have 
$h_{G'}(s_0) \ge \min h_{G'} = H(G')$, so 
$h_G(s_0) = H(G) \le H(G') \le h_{G'}(s_0)$, and therefore 

$$|H(G) - H(G')| \le |h_G(s_0) - h_{G'}(s_0)| \le \frac{J}{2}.$$
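
Lemma 3 can be sanity-checked by brute force on a very small graph. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes Reichardt's choice $f(n_s) = \sum_s n_s(n_s-1)/2$ with the normalization $-\frac{J}{4m}\sum_{i \neq j}$ and $\frac{\gamma}{2m} f$, and it uses the same normalizing m for both graphs so that the f term cancels as in the proof. All function names are hypothetical.

```python
from itertools import product


def potts_energy(edges, spins, J, gamma, m_norm):
    """q-Potts energy with a fixed normalization m_norm (assumption)."""
    # -(J/4m) * sum over ordered pairs i != j counts each undirected edge
    # twice, hence the factor J / (2m) per edge with equal spins.
    aligned = sum(1 for i, j in edges if spins[i] == spins[j])
    counts = {}
    for s in spins:
        counts[s] = counts.get(s, 0) + 1
    penalty = sum(n * (n - 1) / 2 for n in counts.values())
    return -J / (2 * m_norm) * aligned + gamma / (2 * m_norm) * penalty


def optimum(edges, n_nodes, q, J, gamma, m_norm):
    """H(G): exhaustive minimum over all q**n_nodes spin states."""
    return min(
        potts_energy(edges, spins, J, gamma, m_norm)
        for spins in product(range(q), repeat=n_nodes)
    )


def lemma3_gap(edges, removed, n_nodes, q, J, gamma):
    """Return (|H(G) - H(G')|, J / (2m)) where G is G' minus one edge."""
    m_norm = len(edges)
    smaller = [e for e in edges if e != removed]
    gap = abs(
        optimum(edges, n_nodes, q, J, gamma, m_norm)
        - optimum(smaller, n_nodes, q, J, gamma, m_norm)
    )
    return gap, J / (2 * m_norm)


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
gap, bound = lemma3_gap(edges, (2, 3), n_nodes=5, q=3, J=1.0, gamma=0.5)
print(gap, bound, gap <= bound + 1e-12)
```

Exhaustive search is only feasible for a handful of nodes; it is used here solely to confirm the one-edge bound on a toy instance.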

The rest of the proof is exactly the same as in previous sections: using the 
divide-and-conquer technique based on the non-sparse property, which is valid 
for ER[N,p] and $CL_\infty[N,B,\beta]$, and Theorem 1 together with this lemma, 
we prove: 

Theorem 10 In the non-sparse ER[N,p] and $CL_\infty[N,B,\beta]$ ensembles 
$\Omega$, the optimal Q-Potts functional H(G) satisfies the following 
concentration inequalities, respectively: there exists μ > 0 independent of N 
such that for real t > 0, 



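$$P(|H(G) - \langle H \rangle_\Omega| > t) \le 2\exp\left(-\mu^2 N^2 t^2\right) + 2\exp\left(-t^2 N^2 (p - \mu)^2\right)$$

$$P(|H(G) - \langle H \rangle_\Omega| > t) \le 2\exp\left(-\frac{\mu^2 N^{2\beta} t^2}{2\lambda^2}\right) + 2\exp\left(-\frac{t^2 (B/2 - \mu)^2 N^{2\beta}}{2c^2}\right)$$

with λ = 1/2 and c = J/2. 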

Using Lemma 3 and Theorem 3, we prove: 

Theorem 11 In the ER[Np,K] and CL[N,w,K] ensembles $\Omega$ with uniform 
upper bound K on the degree, which is independent of N, the optimal Q-Potts 
functional H(G) satisfies the concentration inequality: for any real t > 0, 



$$P(|H(G) - \langle H \rangle_\Omega| > t) \le 2\exp\left(-\frac{N t^{2}}{\ldots}\right)$$



9 Conclusion 



In this paper, we derive LDP-type inequalities for the optimal values of 
community detecting functions on random networks to show their concentration 
and stability. There is no concentration for the most general case. We prove 
the concentration of the general community detecting function on K-bound 
sparse networks, and an even sharper concentration when the network ensemble 
is generated by a given network and a small perturbation. Then we examine 
several specific cases. The three specific community detecting functions we 
considered are modularity, graph bipartitioning and q-Potts community 
structure. The specific network types we considered are ER and CL networks, 
each in the sparse (K-bound) and non-sparse cases. We prove concentration in 
these cases, which means that the community detecting functions are stable, 
especially when the system size is large enough. 

This work was supported in part by the Army Research Laboratory under 
Cooperative Agreement Number W911NF-09-2-0053, and the Army Research 
Office Grant No. W911NF-09-1-0254. The views and conclusions contained in 
this document are those of the authors and should not be interpreted as 
representing the official policies, either expressed or implied, of the Army 
Research Laboratory or the U.S. Government. 